Abstract

Multi-criteria decision making, made possible by
the advent of skyline queries, has been applied in
many areas. Though most of the existing research is
concerned with only a single relation, several real world 
applications require finding the skyline set of records over 
multiple relations. Consequently, the join operation over 
skylines where the preferences are local to each relation, 
has been proposed. In many of those cases, however, 
the join often involves performing aggregate operations 
among some of the attributes from the different relations. 
In this paper, we introduce such queries as "aggregate 
skyline join queries". Since the naive algorithm is 
impractical, we propose three algorithms to efficiently 
process such queries. The algorithms utilize certain
properties of skyline sets, and process the skylines
as much as possible locally before computing the join.
Experiments with real and synthetic datasets exhibit the 
practicality and scalability of the algorithms with respect 
to the cardinality and dimensionality of the relations. 
1 Introduction

The skyline operator, introduced by Börzsönyi et al. [2],
addresses the problem of multi-criteria decision making 
where there is no clear preference function over the at- 
tributes, and the user wants an overall big picture of which 
objects dominate (equivalently, are better than) other ob- 
jects in terms of preferences set by her. The classic ex- 
ample involves choosing hotels that are good in terms of 
both price and distance to beach. The skyline set of hotels
discards other hotels that are both dearer and farther than a
skyline hotel.



For every attribute, there is a preference function that 
states which objects dominate over other objects. For ex- 
ample, the preference function for both price and distance 
to beach is <, i.e., a hotel with a lower price and at a closer 
distance to the beach than another hotel will dominate the 
second one. Consequently, the second hotel is never going 
to be preferred, and does not require any further consider- 
ation. The skyline query returns all such objects that are 
not dominated by any other object. The importance and
usefulness of skyline queries have prompted commercial
database management systems to incorporate these
queries into existing systems [3].

In real applications, however, a single relation is often
not sufficient for the application, and the skyline needs
to be computed over multiple relations [16]. For example,
consider a flight database. A person traveling from city A
to city B may use stopovers, but may still be interested in
flights that are cheaper, have a shorter overall journey time,
better ratings and more amenities. In this case, a single
relation specifying all direct flights from A to B may not
suffice or may not even exist.
The join of multiple relations consisting of flights start- 
ing from A and those ending at B needs to be processed 
before computing the preferences. 

The above problem becomes even more complex if the 
person is interested in the travel plan that optimizes both 
on the total cost as well as the total journey time for the 
two flights (other than the ratings and amenities of each 
airline). In essence, the skyline now needs to be com- 
puted on attributes that have been aggregated from mul- 
tiple relations in addition to attributes whose preferences 
are local within each relation. The common aggregate op- 
erations are sum, average, minimum, maximum, etc. 

Table 1 shows an example. The first table lists all flights
from city A and the second one lists all flights to city B.
A join of the two tables with the destination of the first
table equal to the source of the second table and departure
time later than arrival time will yield all flights from A to
B with one stopover. As shown in Table 1(c), the joined relation
also contains the total cost, total journey time, ratings and
amenities of the two flights. The user wants a skyline on this
joined relation using these attributes. While the total cost
and total journey time are aggregated attributes, the ratings
and amenities are local to each table. In this example,
flight (13, 23) is dominated by flight (11, 21) in all
the attributes, and hence, will not be preferred. On the
other hand, flight (11, 21) is not dominated by any other
flight and, therefore, is part of the skyline set that the user
wants to examine more thoroughly. We name the above
queries that retrieve skylines over aggregates of attributes
joined using multiple relations Aggregate Skyline
Join Queries (ASJQ).

The above query can be specified in SQL as: 

SELECT f1.fno, f2.fno,
       f1.dst, f2.src, f1.arr, f2.dep,
       f1.rtg, f2.rtg, f1.amn, f2.amn,
       f1.cost + f2.cost AS cost,
       f1.duration + f2.duration AS duration
FROM FlightsA AS f1, FlightsB AS f2
WHERE f1.dst = f2.src AND
      f1.arr < f2.dep
SKYLINE OF cost MIN, duration MIN,
           f1.rtg MAX, f2.rtg MAX,
           f1.amn MAX, f2.amn MAX

Thus, database systems that have the skyline operator
built into them [5] can easily allow the users to run such
queries.

The preferences in the general skyline join problems
are local to each relation, and hence, the skyline operations
can be performed before the join [16]. For ASJQ
queries, however, the skyline is computed over the aggregate
values of attributes from multiple relations. This
leads to performance degradation, since the cardinality of
joined relations is in general large, and the skylines cannot
be processed unless the aggregate values have been
computed. The aggregation function must be monotonic,
i.e., if values s and u are preferred over values t and v
respectively, the aggregated value of s and u must be
preferred over the aggregated value of t and v. The
aggregation operation is reminiscent of the problem of
finding top-k objects using multiple sources [6]. However,
the ASJQ queries differ significantly by retrieving
the skylines in which the aggregate values are only part
of the attribute set. ASJQ queries, thus, involve three
separate problems (skyline queries, join and aggregation
from multiple sources) together, and highlight the
connections among them.
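
The monotonicity requirement stated above can be written compactly. In this sketch, the symbol $\succeq$ (read "is preferred over or equal to") and the generic aggregate operator $\oplus$ are notational assumptions made for illustration only:

```latex
s \succeq t \;\wedge\; u \succeq v
\;\Longrightarrow\;
(s \oplus u) \succeq (t \oplus v)
```

For instance, under a "lower is better" preference, sum, minimum and maximum satisfy this condition, whereas a difference such as $s - u$ does not.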



The ASJQ queries are pertinent in many application do- 
mains. For example, the situation with flights described 
above is quite a routine task for tour planners and travel- 
ing salespersons. Another interesting application is in the 
cricket leagues. Clubs want to buy both good batsmen and 
good bowlers. Batsmen have attributes such as average, 
cost and rating. Similarly, bowlers have strike rate, cost 
and rating. The clubs optimize their chances of winning 
by considering options from the skyline set of batsman- 
bowler combinations with preferences for high average, 
high strike rate, low total cost and high total rating. In 
the same way, to choose an optimal combination of dig- 
ital camera and a compatible memory card from a prod- 
ucts database, it is necessary to join the individual tuples 
containing the attributes of a camera and those of a mem- 
ory card on an attribute such as compatible memory card 
type (e.g., SD, XD, CF etc.), and optimize an aggregate 
attribute such as total cost, in addition to local attributes 
such as optical zoom (for camera) and storage capacity 
(for memory card). ASJQ queries can also be applied in 
the context of multimedia data retrieval [6], geographic
information systems [8], dynamic resource allocation on
the grid [12], e-commerce [15], etc.

The naive method of implementing ASJQ involves 
three steps: (i) performing the join operation over the re- 
lations, (ii) performing the aggregate operations on the at- 
tributes of multiple relations, and (iii) performing the sky- 
line query on the joined relation. For large relations, this 
demands impractical computational costs. By intuition, 
one can observe that non-skyline points in each relation 
cannot appear in the final result set. Hence, performing a 
skyline operation on each relation before joining reduces 
the size of the relations to be joined and, thus, reduces the 
processing cost. 

To reduce the costs further, we designed three algo- 
rithms. The first approach, Multiple Skyline Computa- 
tions (MSC) algorithm, utilizes the fact that certain joins 
of non-skyline sets from the individual relations need not 
be tested for skyline criteria, and can be pruned. The 
Dominator-based algorithm and the Iterative algorithm 
improve on the MSC approach by pruning records even 
from the skyline sets of individual relations before they 
are joined, and are thus more efficient. 

Our contributions in this paper are: 

1. We define a novel query "Aggregate Skyline Join 
Query". 

2. We propose three algorithms that efficiently solve
them.

3. We thoroughly investigate the effects of different parameters
on the algorithms in terms of computational
costs both analytically and through experiments.

Table 1: Example of an Aggregate Skyline Join Query (ASJQ).
(a) Flights from city A (FlightsA). (b) Flights to city B (FlightsB).

The rest of the paper is organized as follows. The Aggregate
Skyline Join Query (ASJQ) is formally defined
in Section 2. A brief literature review is presented in
Section 3. Section 4 proposes and analyzes three algorithms
that efficiently solve the ASJQ queries. Section 5
describes the experimental results before Section 6 concludes.

2 Problem Statement 

We begin by recapitulating the definition of skyline 
queries for a relation. Certain attributes of the relation 
participate in the skyline and are called the skyline at- 
tributes. For each skyline attribute, preference functions 
are specified as part of the skyline query. In a relation 
$R$, a tuple $r_i = (r_{i1}, r_{i2}, \ldots, r_{ik})$ dominates another tuple
$r_j = (r_{j1}, r_{j2}, \ldots, r_{jk})$, denoted by $r_i \succ r_j$, if for
all skyline attributes $c \in \{s_1, \ldots, s_{k'}\} \subseteq \{1, \ldots, k\}$, $r_{ic}$
is preferred over or equal to $r_{jc}$, and there is at least one
attribute $d$ where $r_{id}$ is strictly preferred over $r_{jd}$. A tuple
$r$ is in the skyline set of $R$ if there does not exist any tuple
$s \in R$ that dominates $r$.
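
The definition above translates almost directly into code. The following is a minimal sketch, not an efficient skyline algorithm: it uses a quadratic nested loop, the function names `dominates` and `skyline` are illustrative, and the hotel data is hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(r: Sequence[float], s: Sequence[float], prefs: Sequence[str]) -> bool:
    """Return True if r dominates s.

    prefs[i] is "min" or "max" for skyline attribute i. r must be preferred
    over or equal to s on every attribute and strictly preferred on one.
    """
    strictly_better = False
    for a, b, sense in zip(r, s, prefs):
        if sense == "max":
            a, b = -a, -b  # treat a max preference as a min preference
        if a > b:
            return False
        if a < b:
            strictly_better = True
    return strictly_better


def skyline(rows: List[Tuple[float, ...]], prefs: Sequence[str]) -> List[Tuple[float, ...]]:
    """Keep every tuple that no other tuple dominates (brute force)."""
    return [r for r in rows if not any(dominates(s, r, prefs) for s in rows)]


# Hypothetical hotels: (price, distance to beach), both to be minimised.
hotels = [(100, 5.0), (120, 2.0), (150, 6.0), (90, 8.0)]
print(skyline(hotels, ["min", "min"]))
```

Here the hotel (150, 6.0) is dropped because (100, 5.0) is both cheaper and closer; the other three hotels are incomparable and remain in the skyline.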

For our problem, i.e., ASJQ, the attributes of a relation 
are categorized into three types: (i) local (L): attributes 
on which skyline preferences are applied locally to each 
relation, (ii) aggregate (G): attributes on which skyline 



preferences are applied after the aggregate operations are 
performed during join, (iii) join (H): attributes on which 
no skyline preferences are specified, but are instead used 
for joining the two relations. 

Definition 1 (Local attributes). The attributes of a rela- 
tion on which preferences are applied for the purposes of 
skyline computation, but no aggregate operation with an 
attribute from the other relation is performed, are denoted 
as local attributes. 

Definition 2 (Aggregate attributes). The attributes of a 
relation, on which an aggregate operation is performed 
with another attribute from the other relation, and then 
preferences are applied on the aggregated value for sky- 
line computation, are denoted as aggregate attributes. 

Definition 3 (Join attributes). The attributes of a relation, 
on which no skyline preferences are specified, but are used 
to specify the join conditions between the two relations, 
are denoted as join attributes. 

Denoting the local attributes by $l$, the aggregate attributes
by $g$, and the join attributes by $h$, the two relations
can be represented as:

$R_1 = \{h_{11}, \ldots, h_{1j}, l_{11}, \ldots, l_{1m_1}, g_{11}, \ldots, g_{1n}\}$

$R_2 = \{h_{21}, \ldots, h_{2j}, l_{21}, \ldots, l_{2m_2}, g_{21}, \ldots, g_{2n}\}$

where $R_1$ and $R_2$ have $m_1$ and $m_2$ local attributes respectively,
and $n$ aggregate attributes each. The join condition is a
conjunction of $j$ comparisons between the corresponding
$j$ join attributes ($h$) of the two relations. In this paper, we assume
that join attributes are separate from local and aggregate
attributes. The final joined relation $R = R_1 \bowtie R_2$ is

$R = \{h_{11}, \ldots, h_{1j}, h_{21}, \ldots, h_{2j}, l_{11}, \ldots, l_{1m_1}, l_{21}, \ldots, l_{2m_2}, g_{11} \oplus_1 g_{21}, \ldots, g_{1n} \oplus_n g_{2n}\}$

where $\oplus_1$, etc. denote the aggregate operations.

For the example in Table 1, the local attributes are
amn and rtg, the aggregate attributes are cost and
duration, and the join attributes are dst and arr for
FlightsA, and src and dep for FlightsB.

The Aggregate Skyline Join Query (ASJQ) is 
defined as: 

Definition 4 (Aggregate Skyline Join Queries (ASJQ)). 
The ASJQ queries retrieve the skyline set from the joined 
relation according to the preference functions of its $m_1 + m_2$
local and $n$ aggregate attributes.

Dominance relationships between records can be de- 
fined based on the attributes on which a record dominates 
other records. A tuple $r$ in relation $R_1$ fully dominates another
tuple $s \in R_1$ if $r$ dominates $s$ in both the local and
the aggregate attributes of $R_1$. If $r$ dominates $s$ only in the
local attributes, it is said to locally dominate $s$.

The above definitions assume that whenever a tuple
$t' = u \bowtie v'$ exists in the final relation, the tuple
$t = u \bowtie v$, where $v$ dominates $v'$ ($v \succ v'$), also exists.
However, the join attributes of $v'$ and $v$ may be such that
only $v'$ satisfies the join condition with $u$, but $v$ does not.
Consider flight 15 in Table 1. It is dominated by flight 16
in the local attributes. However, since they have different
destinations, 15 can join with other flights originating
from C (e.g., 23) which flight 16 cannot. Hence, it must not
be considered to be dominated by flight 16. In such cases,
$t'$ may also exist as a skyline in the final result as there is
no $t$ to dominate it. The problem is that the local dominance
did not take into account the join attributes.

In order to handle this, the join attributes must be taken 
into account when full and local dominance relationships 
are defined. Suppose the join condition in which two join attributes,
$a$ from $A$ and $b$ from $B$, participate is $A.a \;\theta\; B.b$,
where $\theta$ may be any of the following five comparison operators:
$=$, $<$, $\leq$, $>$, $\geq$ (we do not consider other operations
in this paper).

Now, consider the tuple $u' \in A$. If it is dominated by
tuple $u \in A$, then it must be ensured that whenever $u'$
joins with $v \in B$, $u$ must also satisfy the joining condition,
i.e., if $u' \bowtie v$ is true, then $u \bowtie v$ must be true as
well. For example, if $\theta$ denotes $=$, then this translates
to $u.a = u'.a$ (both being equal to $v.b$); if $\theta$ denotes $<$,
this translates to $u.a < u'.a$, and similarly for the rest.
(The comparison conditions are reversed for relation $B$.)
This condition can be incorporated in the skyline finding
routines as follows.

Table 2: Converting join conditions to skyline preferences.

The join attribute is also considered as a skyline at- 
tribute with the preference function set appropriately as 
summarized in Table 2. This automatically ensures that
whenever a tuple u' is dominated by u, u' can be discarded 
as the join of u with v can always be formed which will 
ultimately dominate the join of u' with v. 
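
The idea of treating a join attribute as an extra skyline attribute can be sketched as follows. The operator-to-preference mapping in `join_preference` is inferred from the paragraph above rather than copied from the paper's conversion table, and the helper names, the `"eq"` marker and the sample tuples are all hypothetical.

```python
def join_preference(op: str, side: str) -> str:
    """Skyline preference for a join attribute under the condition A.a <op> B.b.

    side is "A" or "B". Returns "eq" (values must match), "min" or "max".
    Assumed mapping: for "<" a smaller A.a joins with at least as many
    B tuples, so "min" is preferred on A and the reverse holds on B.
    """
    if op == "=":
        return "eq"
    if op not in ("<", "<=", ">", ">="):
        raise ValueError(f"unsupported operator: {op}")
    smaller_a_joins_more = op in ("<", "<=")
    if side == "A":
        return "min" if smaller_a_joins_more else "max"
    return "max" if smaller_a_joins_more else "min"  # reversed for relation B


def dominates(r: dict, s: dict, prefs: dict) -> bool:
    """Dominance over the attributes in prefs; "eq" attributes must be equal."""
    better = False
    for key, sense in prefs.items():
        if sense == "eq":
            if r[key] != s[key]:
                return False
            continue
        a, b = (r[key], s[key]) if sense == "min" else (-r[key], -s[key])
        if a > b:
            return False
        better = better or a < b
    return better


# Hypothetical tuples of relation A (join condition: dst = src AND arr < dep).
u_prime = {"dst": "C", "arr": 10, "rtg": 3, "amn": 2}
other_city = {"dst": "D", "arr": 9, "rtg": 4, "amn": 3}
same_city = {"dst": "C", "arr": 9, "rtg": 4, "amn": 3}

local_only = {"rtg": "max", "amn": "max"}
with_join = {"dst": join_preference("=", "A"),
             "arr": join_preference("<", "A"),
             **local_only}

print(dominates(other_city, u_prime, local_only))  # better ratings and amenities
print(dominates(other_city, u_prime, with_join))   # different destination
print(dominates(same_city, u_prime, with_join))    # same destination, earlier arrival
```

With only the local attributes, the tuple bound for another city appears to dominate `u_prime`; once the join attributes are included, only the tuple with the same destination and an earlier arrival does.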

Based on the above discussion, the definitions of dom- 
inance relationships are modified as follows. 

Definition 5 (Full dominance). A tuple r in relation R 
fully dominates a tuple s if r dominates s in local, aggre- 
gate and join attributes of R. 

Definition 6 (Local dominance). A tuple r in relation R 
locally dominates a tuple s if r dominates s in local and 
join attributes of R. 

Henceforth, whenever we mention local or aggregate at-
tributes in the context of dominance, we assume that the 
join attributes are incorporated within them. 

Note that full dominance implies local dominance, but 
not vice versa. The corresponding definitions of full dom- 
inator and local dominator are also specified. Using these 
definitions, two kinds of skyline sets are also defined. A 
tuple r in relation R is in the full skyline set if no tuple in 
R fully dominates r, and it is in the local skyline set if no 
tuple in R locally dominates it. A tuple that is in the local 
skyline set is also in the full skyline set, but not vice versa. 

3 Related Work 

The maximum vector problem or Pareto curve [11] in
the field of computational geometry has been imported
to databases forming the skyline query [2]. After the first
skyline algorithm proposed by Kung et al. [11], there were
many algorithms devised by exploring the properties of
skylines. Some representative non-indexed algorithms are
SFS and LESS. Using index structures, algorithms
such as NN and BBS have been proposed.

In [9], Jin et al. proposed the multi-relational skyline
operator. They also designed algorithms to find such skylines
over multiple relations. In [16], Sun et al. coined the
term "skyline join" in the context of distributed environments.
They extended SaLSa [1] and also proposed an iterative
algorithm that prunes the search space in each step.
ASJQ queries differ in that they extend the skyline join
proposed in [9] with aggregate operations performed during
the join. This renders the existing techniques
inapplicable as they work only on the local attributes.

There are various algorithms for joining such as nested- 
loop join, indexed nested-loop join, merge-join and
hash-join [14]. Nested-loop joins can be used regardless of the
join condition. The other join techniques are more effi- 
cient, but can handle only simple join conditions, such as 
natural joins or equi-joins. Any of these join algorithms 
that is applicable for the given query can be used with 
ASJQ algorithms. 

4 Algorithms 

In this section, we describe the various algorithms that 
have been designed to process the ASJQ queries. We 
start with the naive one before moving on to the more
sophisticated algorithm that uses the multiple skyline
computations (MSC) approach. The last two algorithms,
dominator-based and iterative, improve upon the MSC
approach. For each algorithm, we also provide an analysis 
of its computation cost. 

The pseudocode of the algorithms assumes the procedures
computeFullSkyline, computeLocalSkyline,
computeJoin, and aggregate. The algorithms for
these methods are not shown, since any efficient skyline or 
join algorithm can be plugged into these methods. The ag- 
gregate method simply computes the aggregate operations 
on the specified attributes. Even though the efficiency of 
the entire method depends on the complexities of these 
algorithms, we have not experimented with them as the 
focus of this paper is on processing the ASJQ part. 

4.1 Naive Algorithm 

The naive method of processing ASJQ queries is shown
in Algorithm 1. It computes the join of the two input relations
and applies the aggregate operations, before computing
the skyline on the joined and aggregated relation using
the preferences. There are two costs involved in this algorithm:
the joining cost and the cost of the skyline computation.
The cost of aggregation is not included, because it can be
done when two tuples are joined, without any extra cost.

Algorithm 1 Naive Algorithm
Input: Relations A, B, preferences p, aggregate operations a
Output: Aggregate skyline join relation S
    J ← computeJoin(A, B)
    R ← aggregate(J, a)
    S ← computeFullSkyline(R, p)
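
The naive method can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's implementation: the join is a nested loop, both aggregates are sums, the skyline is a brute-force scan, and the flight data (numbers 101 to 203) is hypothetical and does not reproduce Table 1.

```python
from itertools import product

# Preferences on the joined relation: two aggregate and two local attributes.
PREFS = {"cost": "min", "dur": "min", "rtg1": "max", "rtg2": "max"}


def dominates(r, s, prefs):
    """r dominates s: no worse on every attribute, strictly better on one."""
    better = False
    for key, sense in prefs.items():
        a, b = (r[key], s[key]) if sense == "min" else (-r[key], -s[key])
        if a > b:
            return False
        better = better or a < b
    return better


def naive_asjq(A, B, prefs=PREFS):
    """Join, aggregate while joining, then compute the skyline."""
    joined = []
    for f1, f2 in product(A, B):
        if f1["dst"] == f2["src"] and f1["arr"] < f2["dep"]:  # join condition
            joined.append({
                "fno": (f1["fno"], f2["fno"]),
                "cost": f1["cost"] + f2["cost"],  # aggregate attributes
                "dur": f1["dur"] + f2["dur"],
                "rtg1": f1["rtg"],                # local attributes
                "rtg2": f2["rtg"],
            })
    return [t for t in joined if not any(dominates(s, t, prefs) for s in joined)]


# Hypothetical data.
flights_a = [
    {"fno": 101, "dst": "C", "arr": 10, "cost": 100, "dur": 2, "rtg": 5},
    {"fno": 102, "dst": "C", "arr": 11, "cost": 150, "dur": 3, "rtg": 3},
    {"fno": 103, "dst": "D", "arr": 9, "cost": 80, "dur": 2, "rtg": 4},
]
flights_b = [
    {"fno": 201, "src": "C", "dep": 12, "cost": 90, "dur": 2, "rtg": 4},
    {"fno": 202, "src": "D", "dep": 8, "cost": 50, "dur": 1, "rtg": 5},
    {"fno": 203, "src": "C", "dep": 13, "cost": 120, "dur": 2, "rtg": 5},
]

result = naive_asjq(flights_a, flights_b)
print([t["fno"] for t in result])
```

On this input the call prints `[(101, 201), (101, 203)]`: four pairs satisfy the join condition, and the two pairs that start with flight 102 are dominated.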

4.1.1 Analysis 

We denote the cost of a skyline operation on a relation of
$N$ tuples having $a$ attributes by $S(N, a)$. The cost of a join
operation on two relations of size $N_1$ and $N_2$ is denoted
by $J(N_1, N_2)$. Since the aggregate operations are done as
part of the join, the cost of those operations is not taken
into account separately. Rather, if $g$ attributes are aggregated,
the cost of the join is denoted by $J(N_1, N_2, g)$, by
incorporating the parameter within it.

Assuming the relations $A$ and $B$ contain $N_A$ and $N_B$
tuples respectively with $n$ aggregate attributes, the cost
of joining and aggregation is $J(N_A, N_B, n)$. The joined
relation contains at most $N_A N_B$ tuples, each having
$m_1 + m_2 + n$ attributes, and therefore, the cost of the skyline
operation is $S(N_A N_B, m_1 + m_2 + n)$.

When operating on large relations, the above costs are 
impractical. However, an advantage of the algorithm, 
apart from being the simplest to implement, is the fact 
that it is independent of the distribution of the data. 

4.2 Performing Skylines before Join 

Processing ASJQ queries can be made more efficient by 
pushing the join operation after the full skylines have been 
evaluated in each relation, thereby discarding tuples that 
are fully dominated by other tuples. These records are 
guaranteed not to exist in the ASJQ result set. 

Denoting the full skyline sets in each relation by $A_0$
and $B_0$ respectively, and the non-skyline sets by $A_0'$ and
$B_0'$ respectively, i.e., $A_0' = A - A_0$ and $B_0' = B - B_0$, the
following theorem shows that any tuple formed by joining
a tuple from either $A_0'$ or $B_0'$ or both cannot be part of the
final skyline set.

Theorem 1. A tuple formed by joining a tuple that is not 
a full skyline in the individual relation never exists in the 
final skyline set. 
Proof. Consider a tuple $t' \in A_0 \bowtie B_0'$ formed by joining
a tuple $u \in A_0$ with a tuple $v' \in B_0'$. Since $v'$ is not a full
skyline, there exists a tuple $v \in B_0$ that fully dominates
$v'$. Consider the tuple $t = u \bowtie v$ (which exists, since full
dominance covers the join attributes). The attributes in $l_1$
of $t$ are equal to those of $t'$, but dominate in $l_2$. Consider
an aggregate attribute $g_i' = g_{1i} \oplus_i g_{2i}'$ of $t'$. The corresponding
attribute value for $t$ is $g_i = g_{1i} \oplus_i g_{2i}$. Since
$g_{2i}$ dominates $g_{2i}'$ and $\oplus_i$ is a monotone aggregate function,
$g_i$ dominates $g_i'$. Hence, overall, the tuple $t$ dominates $t'$.
Thus, $t'$ cannot be part of the skyline.

Similarly, any tuple in $A_0' \bowtie B_0$ or $A_0' \bowtie B_0'$ is dominated
by tuples formed by joining the corresponding dominators,
and will never exist in the final skyline set. □

As an example, consider flights 11 and 17. Flight 11 
fully dominates flight 17 and satisfies the conditions for 
the join attributes as well. This ensures that any other 
flight joined with 17 (e.g., 21) can be joined with 11 as 
well, and the resulting joined tuple (11, 21) will surely 
dominate (17, 21). Hence, flight 17 need not be consid- 
ered any further. On the other hand, even though flight 24 
dominates flight 26 in the local and aggregate attributes, 
the join attributes are not compatible as the sources of the 
flights are different. Hence, a tuple joined with 26 will 
not be dominated by that joined by 24 as the latter tuple is 
invalid according to the join criteria. 

Thus, following the above theorem, the tuples from the
sets $A_0'$ and $B_0'$ can be discarded. The remaining tuples
may or may not exist in the final result set. For example,
consider flight 23 in the second relation. It joins with three
tuples from the first relation as shown in Table 1(c). Of
these, (11, 23) exists in the final skyline set while (13, 23)
and (15, 23) do not as they are dominated by (11, 23).

However, not all possible joined tuples from $A_0$ and $B_0$
need to be examined. Each full skyline set can be further
divided by extracting the local skylines from them. Suppose
the local skyline sets for $A_0$ and $B_0$ are $A_1$ and $B_1$
respectively. Correspondingly, let $A_1'$ and $B_1'$ be the sets
of non-skyline points within $A_0$ and $B_0$ respectively, i.e.,
they are full skylines but not local skylines. Mathematically,
$A_1' = A_0 - A_1$ and $B_1' = B_0 - B_1$. The following
theorem shows that any tuple formed by joining a tuple
from either $A_1$ or $B_1$ or both must be part of the final
skyline set.

Theorem 2. The tuples formed by joining either or both 
of the tuples which are local skylines in the individual re- 
lations must exist in the final skyline set. 

Proof. Consider a tuple $t \in A_1 \bowtie B_1'$ formed by joining
a tuple $u \in A_1$ with a tuple $v' \in B_1'$. Since $u$ is a local
skyline, there exists no tuple $u' \in A_0$ that locally (and
therefore, fully) dominates $u$. Thus, for any other joined
tuple $t' \in A_0 \bowtie B_0$, $t'$ cannot have local attributes of $A$ that
dominate over $t$. Thus, $t$ must be part of the skyline.

Similarly, any tuple in $A_1' \bowtie B_1$ or $A_1 \bowtie B_1$ is not
dominated by any other tuple in all the attributes, and will
therefore, always exist in the final skyline set. □

Consider flight 11 in the first relation and 21 in the second
relation. Both are local skylines in the corresponding
full skyline sets, i.e., they are part of $A_1$ and $B_1$ respectively.
Any tuple joined with 11 (e.g., 23) must be part of
the final skyline as no other tuple can dominate (11, 23) in
the local attributes of the first relation, i.e., f1.amn and
f1.rtg.

However, nothing can be concluded directly about the
tuples formed by joining $A_1'$ with $B_1'$; they may or may
not exist in the ASJQ result set. Though their local attributes
will be dominated, their aggregate attributes may
be better, and therefore, they may be part of the skyline.
Consider the joined tuple (13, 23). It is dominated by (11,
21) even in the aggregate attributes, and is, hence, not a
skyline. On the other hand, the tuple (14, 24) is a skyline,
even though 14 is locally dominated by 11 and 24 by
21; however, the aggregate attributes of (14, 24) are more
preferable. Hence, the tuples in $A_1' \bowtie B_1'$ need to be
processed to determine the ASJQ records among them.

The ASJQ algorithms utilize Theorem 1 and Theorem 2
to reduce the processing by first determining the skyline 
sets before joining. 

In addition to the high processing costs, the naive algorithm
suffers from the problem of non-progressive result
generation, i.e., it presents the results only after complete
processing of the algorithm. In real applications with
large datasets, query processing may take a lot of time,
and this large response time, even for the first result, may
be undesirable for many users. This can be handled by
devising online algorithms that generate a subset of the
full results quickly and progressively generate the remaining
tuples thereafter. Though the full results are still output only
after complete processing, these can be used in real-time
applications.

Algorithm 2 MSC Algorithm
Input: Relations A, B, preferences p, aggregate operations a
Output: Aggregate skyline join relation S
    A_0 ← computeFullSkyline(A)
    B_0 ← computeFullSkyline(B)
    (A_1, A_1') ← computeLocalSkyline(A_0)
    (B_1, B_1') ← computeLocalSkyline(B_0)
    J ← computeJoin(A_1, B_1) ∪ computeJoin(A_1, B_1') ∪ computeJoin(A_1', B_1)
    R ← aggregate(J, a)
    J' ← computeJoin(A_1', B_1')
    R' ← aggregate(J', a)
    S ← R ∪ computeFullSkyline(R', R)   /* finds skyline points in R' by treating the current skyline as R */

MSC and the next set of algorithms achieve this by gen- 
erating tuples that are sure to be in the final skyline set 
without processing all the tuples in the joined relation. 

4.3 Multiple Skyline Computations (MSC) 
Algorithm 

The Multiple Skyline Computations (MSC) algorithm
uses the results of the above two theorems, and immediately
outputs the tuples in $A_1 \bowtie B_1$, $A_1 \bowtie B_1'$, and
$A_1' \bowtie B_1$. It then examines the tuples from $A_1' \bowtie B_1'$ to
determine whether they are part of the final skyline set.
Algorithm 2 shows the complete algorithm.

Moreover, processing the joined relation, which is 
generally large, constitutes most of the processing cost. 
Hence, algorithms that reduce the number of comparisons 
in the joined relation without processing the whole relation
improve the efficiency of ASJQ processing.

Table 3 and Table 4 respectively show the division of
the sets A and B from Table 1 into the different categories.
The naive algorithm finds the skyline by examining
11 joined tuples. Theorem 1 reduces the number of
joined tuples to 6 (as shown in Table 1(c)). By applying
Theorem 2, the MSC algorithm reduces it further by computing
the sets $A_1'$ and $B_1'$. The total number of tuples in
$A_1' \bowtie B_1'$ on which the final skyline needs to be computed
is only 3.
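
The steps of the MSC approach can be sketched as below. This is a hedged illustration, not the authors' code: it assumes a plain equi-join on a single stopover attribute (called `city` here for both relations), sum aggregates, and no ties on the local attributes; the names `split`, `join` and `msc` and the flight data are hypothetical. The final lines compare the output with a brute-force skyline of the full join for this small input only.

```python
from itertools import product

FULL = {"city": "eq", "cost": "min", "dur": "min", "rtg": "max"}   # local + aggregate + join
LOCAL = {"city": "eq", "rtg": "max"}                               # local + join
JOINED = {"cost": "min", "dur": "min", "rtg1": "max", "rtg2": "max"}


def dominates(r, s, prefs):
    """r dominates s under prefs; "eq" marks a join attribute that must match."""
    better = False
    for key, sense in prefs.items():
        if sense == "eq":
            if r[key] != s[key]:
                return False
            continue
        a, b = (r[key], s[key]) if sense == "min" else (-r[key], -s[key])
        if a > b:
            return False
        better = better or a < b
    return better


def split(rows, prefs):
    """Partition rows into (skyline, non-skyline) under prefs."""
    sky = [r for r in rows if not any(dominates(s, r, prefs) for s in rows)]
    rest = [r for r in rows if r not in sky]
    return sky, rest


def join(X, Y):
    """Equi-join on the stopover city; cost and dur are aggregated by sum."""
    return [{"fno": (x["fno"], y["fno"]),
             "cost": x["cost"] + y["cost"], "dur": x["dur"] + y["dur"],
             "rtg1": x["rtg"], "rtg2": y["rtg"]}
            for x, y in product(X, Y) if x["city"] == y["city"]]


def msc(A, B):
    A0, _ = split(A, FULL)            # full skylines; the rest is discarded
    B0, _ = split(B, FULL)
    A1, A1p = split(A0, LOCAL)        # local skylines and the remainder
    B1, B1p = split(B0, LOCAL)
    R = join(A1, B1) + join(A1, B1p) + join(A1p, B1)   # reported immediately
    Rp = join(A1p, B1p)               # only these need a skyline check
    pool = R + Rp
    return R + [t for t in Rp if not any(dominates(s, t, JOINED) for s in pool)]


def row(fno, city, cost, dur, rtg):
    return {"fno": fno, "city": city, "cost": cost, "dur": dur, "rtg": rtg}


# Hypothetical data (does not reproduce Table 1).
A = [row(101, "C", 100, 2, 5), row(102, "C", 80, 2, 3),
     row(103, "C", 120, 3, 2), row(104, "D", 90, 1, 4)]
B = [row(201, "C", 90, 2, 5), row(202, "C", 60, 2, 3),
     row(203, "C", 95, 3, 4), row(204, "D", 70, 2, 4)]

brute_force, _ = split(join(A, B), JOINED)
print(sorted(t["fno"] for t in msc(A, B)))
print(sorted(t["fno"] for t in brute_force))
```

In this run only one joined tuple, (102, 202), has to be checked against the others; the remaining four result tuples are emitted directly from the three joins that involve a local skyline.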



Figure 1: Break-up of skyline sets for iterative algorithm.

4.3.1 Analysis 

We next analyze the costs of the MSC algorithm. Using
the same notation as in the analysis of the naive algorithm,
the first two full skyline computations have a cost of
$S(N_A, j + m_1 + n) + S(N_B, j + m_2 + n)$, where $N_C$ denotes
the cardinality of the set $C$. The cost of computing
the local skylines next is $S(N_{A_0}, m_1) + S(N_{B_0}, m_2)$.

The total cost of computing the three joins, $A_1 \bowtie B_1$,
$A_1 \bowtie B_1'$, and $A_1' \bowtie B_1$, is $J(N_{A_1}, N_{B_1}, n) +
J(N_{A_1}, N_{B_1'}, n) + J(N_{A_1'}, N_{B_1}, n)$. The full skyline operator
is applied on the tuples from $A_1' \bowtie B_1'$, thereby incurring
a cost of at most $S(N_{A_1'} \cdot N_{B_1'}, m_1 + m_2 + n)$.
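
Collecting the terms stated above gives a single upper bound for the MSC algorithm. The label $C_{\mathrm{MSC}}$ is introduced here only for illustration and is not notation used by the paper; the bound simply sums the listed costs and adds nothing beyond them.

```latex
\begin{aligned}
C_{\mathrm{MSC}} \le{}& S(N_A,\, j + m_1 + n) + S(N_B,\, j + m_2 + n) \\
 &+ S(N_{A_0},\, m_1) + S(N_{B_0},\, m_2) \\
 &+ J(N_{A_1}, N_{B_1}, n) + J(N_{A_1}, N_{B_1'}, n) + J(N_{A_1'}, N_{B_1}, n) \\
 &+ S(N_{A_1'} \cdot N_{B_1'},\, m_1 + m_2 + n)
\end{aligned}
```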

The MSC algorithm performs significantly better than
the naive one when the cardinality of the full skyline set
is low but that of the local skyline sets is high. A number
of skyline tuples can be generated quickly and only a few
(those in $A_1' \bowtie B_1'$) require a complete investigation.
Since the skylines are computed locally, the number of
local attributes plays a big role. With more
local attributes, the size of $A_1$ ($B_1$) grows. However, in
that case, the cardinality of $A_0$ ($B_0$), and hence, that of
$A_1'$ ($B_1'$) will be large as well, thereby reducing some of
the benefits of the MSC algorithm. Section 5 analyzes the
effect of these parameters.

4.4 Dominator-Based Approach 

In order to further reduce the processing cost of tuples
from $A_1' \bowtie B_1'$, the following two algorithms are designed.
The first algorithm makes use of dominator properties
among the tuples and prunes away unnecessary
comparisons while determining the ASJQ records within
the set $A_1' \bowtie B_1'$.

Consider a tuple $t' \in A_1' \bowtie B_1'$ formed by joining tuples
$u' \in A_1'$ and $v' \in B_1'$, i.e., $t' = u' \bowtie v'$. The tuple $t'$
can be dominated by only certain records of the skyline set
$(A_1 \bowtie B_1) \cup (A_1' \bowtie B_1) \cup (A_1 \bowtie B_1')$. Identifying these
records avoids comparing with the whole sets. Suppose
the local dominators of $u'$ ($v'$) are represented by $ld(u')$
($ld(v')$). The following lemma proves that $t'$ can be dominated
by only those tuples $t$ that are in $ld(u') \bowtie ld(v')$,
and nothing else.

Algorithm 3 Dominator-Based Algorithm
Input: Relations A, B, local preferences l, preferences p, aggregate operations a
Output: Aggregate skyline join relation S
    A_0 ← computeFullSkyline(A)
    B_0 ← computeFullSkyline(B)
    (A_1, A_1') ← computeLocalSkyline(A_0)
    (B_1, B_1') ← computeLocalSkyline(B_0)
    (A_1, A_1', D_A) ← findLocalDominators(A_0, l)   /* using Algorithm 4 */
    (B_1, B_1', D_B) ← findLocalDominators(B_0, l)   /* using Algorithm 4 */
    J ← computeJoin(A_1, B_1) ∪ computeJoin(A_1, B_1') ∪ computeJoin(A_1', B_1)
    R ← aggregate(J, a)
    J' ← computeJoin(A_1', B_1')
    R' ← aggregate(J', a)
    S ← R ∪ computeSkylineUsingDominators(R', D_A, D_B)   /* finds skyline points in R' by using dominator sets D_A, D_B (Algorithm 5) */

Algorithm 4 Skyline Computation and Finding Dominators
Input: Relation A_0, local preferences p
Output: Skyline set A_1, Non-skyline set A_1', Dominator set D_1
    while r' ← readRecord(A_0) do
        flag ← 0
        while r ← readRecord(A_0) do
            if r locally dominates r' using preferences p then
                D(r') ← D(r') ∪ r
                flag ← 1
            end if
        end while
        if flag = 0 then
            A_1 ← A_1 ∪ r'
        else
            A_1' ← A_1' ∪ r'
            D_1 ← D_1 ∪ D(r')
        end if
    end while
    S ← (A_1, A_1', D_1)

Lemma 1. A tuple $t' = u' \bowtie v'$ in $A_1' \bowtie B_1'$ can be dominated only by those tuples that are formed by joining tuples in the local dominator sets of $u'$ and $v'$, i.e., in $ld(u') \bowtie ld(v')$.

Proof. Consider a tuple $t' = u' \bowtie v' \in A_1' \bowtie B_1'$. Also consider a tuple $u$ that is not a local dominator of $u'$, i.e., $u \notin ld(u')$, and a tuple $t$ formed by joining $u$ with any $v \in B$. The local attributes $l_1$ of $t'$ are not dominated by those in $t$, as otherwise $u'$ would have been locally dominated by $u$. Thus, $t$ cannot dominate $t'$. Similarly, any $t$ formed by joining any $u$ with $v \notin ld(v')$ cannot dominate $t'$, as the local attributes of the second relation will not be dominated. Hence, $t'$ can be dominated only by $t \in ld(u') \bowtie ld(v')$. □

The records in $ld(u') \bowtie ld(v')$ are not guaranteed to dominate $t'$, though. This is because $u'$ contains aggregate attributes that are not dominated by those of the tuples in $ld(u')$ (the reason being that $u'$ belongs to the set $A_0$, i.e., it is a full skyline record). Hence, the tuple $t'$ may need to be compared with all the tuples in $ld(u') \bowtie ld(v')$. This still reduces the computation cost of the last step of the MSC algorithm significantly, as $t'$ is no longer compared with all tuples of $(A_1 \bowtie B_1) \cup (A_1' \bowtie B_1) \cup (A_1 \bowtie B_1')$.



However, the earlier steps perform more work, since the dominator sets must be found for each tuple that is not in the local skyline set. In other words, the algorithm spends more in the MSC step to record relationships among the records, and the overall computational cost is reduced by exploiting those relationships in the later stages of the algorithm.

Algorithm 3 summarizes the approach. It uses Algorithm 4 to find the local dominator sets for each record that is in $A_1'$ (and $B_1'$). Algorithm 5 shows the subroutine that uses these local dominator sets to determine whether a tuple is in the final skyline set.

In the running example, flight 13 is locally dominated by flights 11 and 12, while flight 23 is locally dominated by flights 21 and 22. Therefore, to determine whether tuple (13, 23) is a skyline in the ASJQ set, it needs to be checked only against (11, 21). (The other combinations do not generate valid joined tuples.) This is a large improvement over the MSC algorithm, which checks (13, 23) against 5 joined tuples from $(A_1 \bowtie B_1) \cup (A_1' \bowtie B_1) \cup (A_1 \bowtie B_1')$.
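The pruning idea behind Lemma 1 and Algorithm 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the function names, the attribute values, and the use of a component-wise sum as the aggregate are assumptions made for illustration. Lower values are assumed to be preferred, and local attribute vectors are assumed to be free of ties. Only the flight numbers and their local dominance relationships mirror the example above.

```python
from typing import NamedTuple

class Rec(NamedTuple):
    name: str     # hypothetical record identifier
    key: str      # join attribute value (equi-join)
    local: tuple  # local attributes, lower is better (assumption)
    agg: tuple    # aggregate attributes, lower is better (assumption)

def dominates(x, y):
    """True if x is no worse than y everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def local_dominators(rel):
    """For each record, collect the records that dominate it on local attributes."""
    return {r.name: [s for s in rel if dominates(s.local, r.local)] for r in rel}

def aggregate(u, v):
    """Aggregate attributes of the joined tuple; sum is an assumed choice."""
    return tuple(a + b for a, b in zip(u.agg, v.agg))

def dominated_using_dominators(u, v, ld_a, ld_b):
    """Test the join of u and v only against joins of their local dominators."""
    target = aggregate(u, v)
    for du in ld_a[u.name]:
        for dv in ld_b[v.name]:
            if du.key != dv.key:
                continue  # not a valid joined tuple
            # Local attributes are already dominated, so only aggregates are compared.
            if all(a <= b for a, b in zip(aggregate(du, dv), target)):
                return True
    return False

# Made-up attribute values; only the dominance structure follows the text.
A = [Rec("11", "C", (1,), (210, 3)),
     Rec("12", "D", (2,), (250, 6)),
     Rec("13", "C", (3,), (200, 4))]
B = [Rec("21", "C", (1,), (70, 2)),
     Rec("22", "E", (2,), (90, 3)),
     Rec("23", "C", (3,), (80, 1))]
ld_a, ld_b = local_dominators(A), local_dominators(B)
```

With these made-up values, `dominated_using_dominators(A[2], B[2], ld_a, ld_b)` examines only the pair (11, 21), because the other three dominator combinations do not share a join value, and it reports (13, 23) as dominated.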

Algorithm 5 Skyline Computation using Dominators

Input: Non-skyline set $R'$, dominator sets $D_A$, $D_B$, preferences $p$, aggregate operations $a$
Output: Skyline set $R$
1: while $r' \leftarrow$ readRecord($R'$) do
2:   $r' = u \bowtie v$
3:   flag $\leftarrow 0$
4:   while $d_A \leftarrow$ readDominator($r'$, $D_A$) do
5:     /* read record from $D_A$ that locally dominates $u$ */
6:     while $d_B \leftarrow$ readDominator($r'$, $D_B$) do
7:       /* read record from $D_B$ that locally dominates $v$ */
8:       $r \leftarrow$ Aggregate($d_A \bowtie d_B$, $a$) /* read full record from $R$ that has $d_A$ and $d_B$ */
9:       if $r$ fully dominates $r'$ according to preferences $p$ then
10:         discard $r'$
11:         flag $\leftarrow 1$
12:         break
13:       end if
14:     end while
15:     if flag $= 1$ then
16:       break
17:     end if
18:   end while
19:   if flag $= 0$ then
20:     $R \leftarrow R \cup r'$
21:   end if
22: end while

4.4.1 Analysis

We now analyze the costs of the dominator-based algorithm with respect to the MSC algorithm. The first two full skyline computations have the same cost of $S(N_A, m_1 + n) + S(N_B, m_2 + n)$. The local skylines are computed next, with a total cost of $S(N_{A_0}, m_1) + S(N_{B_0}, m_2)$.

In addition to the skyline computations, the dominator sets are computed. Denoting the cost of dominator computation by $D$, this cost is $D(N_{A_0}) + D(N_{B_0})$. Note that even though the dominators are maintained only for the tuples of $A_1'$ and $B_1'$, all the tuples of $A_0$ and $B_0$ need to be processed. Suppose the sizes of the dominator sets are $d_{A_1'}$ and $d_{B_1'}$, respectively.

The skyline operator is next applied on the tuples from $A_1' \bowtie B_1'$ using the dominators found in the previous step. This cost is at most $SD(N_{A_1'} \cdot N_{B_1'}, d_{A_1'} \times d_{B_1'}, n)$. Note that the dimensionality of the skyline operation using dominators here is only $n$, i.e., only the aggregate attributes need to be checked for dominance, as the local attributes are, by definition, dominated by the local dominators.

Finally, the total cost of computing the three other joins, $A_1 \bowtie B_1$, $A_1 \bowtie B_1'$, and $A_1' \bowtie B_1$, is the same as that of the MSC algorithm, and can be denoted by $J(A_1, B_1, n) + J(A_1, B_1', n) + J(A_1', B_1, n)$.

The dominator-based algorithm thus performs well when the dominator sets are small. Otherwise, the overhead of dominator computation may be too large to gain any speedup over the MSC algorithm. Section 5 compares these algorithms experimentally.

4.5 Iterative Algorithm 

The dominator-based algorithm involves computation of 
local dominator sets which can be costly. By eliminat- 
ing the costly dominator computations, we devise another 
algorithm which is iterative in nature and is an attractive 
online algorithm. 

The main cost of the MSC algorithm is the skyline computation on the join of the two sets $A_1'$ and $B_1'$. The iterative algorithm reduces this cost by further dividing the set $A_1'$ ($B_1'$) into local skylines $A_2$ ($B_2$) and non-skylines $A_2'$ ($B_2'$). This division is repeated until the cardinality of the non-skyline set is less than a preset threshold $\delta$. The relation $A_0$ (similarly, $B_0$) is thus subdivided into $A_1, A_2, \ldots, A_k, A_k'$, as shown in Figure 1.
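The repeated subdivision can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, the representation of a tuple by its local attribute vector only, the sample values, and the convention that lower values are preferred are all assumptions.

```python
def dominates(x, y):
    """True if x is no worse than y everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def split_local_skyline(rows):
    """Split rows into (skyline, non-skyline) on their local attributes."""
    sky, rest = [], []
    for r in rows:
        (rest if any(dominates(s, r) for s in rows) else sky).append(r)
    return sky, rest

def skyline_levels(rows, delta):
    """Peel off local skylines until fewer than delta non-skyline rows remain."""
    levels, rest = [], list(rows)
    while True:
        sky, rest = split_local_skyline(rest)
        levels.append(sky)
        if not rest or len(rest) < delta:
            return levels, rest

# Made-up local attribute vectors of the tuples in a full skyline set.
rows = [(1, 4), (2, 2), (4, 1), (2, 5), (3, 3), (5, 2), (4, 4), (5, 5)]
levels, remainder = skyline_levels(rows, delta=3)
```

Here `levels[0]` plays the role of $A_1$, `levels[1]` the role of $A_2$, and `remainder` the role of the final non-skyline set, which is left undivided because it holds fewer than `delta` tuples.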

By observing certain relationships among these sets, we can determine that the dominators of the records of a set exist only in a few of the other sets, so its records need to be compared only with those sets. For example, a tuple in $A_2 \bowtie B_2$ needs to be compared with tuples in $A_1 \bowtie B_1$ only, thereby eliminating unnecessary comparisons with tuples in $(A_1 \bowtie B_1') \cup (A_1' \bowtie B_1) \cup (A_1' \bowtie B_1')$.

Lemma 2. A tuple in $A_2 \bowtie B_2$ can be dominated only by a tuple in $A_1 \bowtie B_1$ and not by any tuple in $(A_1 \bowtie B_1') \cup (A_1' \bowtie B_1) \cup (A_1' \bowtie B_1')$.

Proof. Consider a tuple $t' = u' \bowtie v' \in A_2 \bowtie B_2$. Consider any other tuple $t = u \bowtie v \in A_1' \bowtie B_1$. If $t$ dominates $t'$, then the $l_1$ local attributes of $t$ pertaining to $u$ must dominate those of $u'$. However, since $A_2$ is the local skyline set of $A_1'$, this contradicts the fact that no tuple in $A_1'$ locally dominates a tuple in $A_2$. Similarly, no tuple in $A_1 \bowtie B_1'$ or $A_1' \bowtie B_1'$ can dominate $t'$. □

For each such set $A_i \bowtie B_j$, there exist target sets within which it has to search for its dominators and test for the ASJQ requisites. We show the target sets up to two iterations in Table 5.

The iterative algorithm is summarized in Algorithm 6. In each relation, the skyline sets are computed until the threshold $\delta$ is reached. All combinations of such non-skyline sets are then joined, and the dominators for aggregates are checked only against the corresponding target sets.

Algorithm 6 Iterative Algorithm

Input: Relations $A$, $B$, local preferences $l$, preferences $p$, aggregate operations $a$
Output: Aggregate skyline join relation $S$
1: $A_0 \leftarrow$ computeFullSkyline($A$)
2: $B_0 \leftarrow$ computeFullSkyline($B$)
3: $i \leftarrow 1$
4: while $|A_i| \geq \delta$ do
5:   $A_{i+1} \leftarrow$ computeLocalSkyline($A_i$)
6:   $i \leftarrow i + 1$
7: end while
8: $L_A \leftarrow i$ /* Number of levels of skyline sets in $A$ */
9: $j \leftarrow 1$
10: while $|B_j| \geq \delta$ do
11:   $B_{j+1} \leftarrow$ computeLocalSkyline($B_j$)
12:   $j \leftarrow j + 1$
13: end while
14: $L_B \leftarrow j$ /* Number of levels of skyline sets in $B$ */
15: $J \leftarrow$ computeJoin($A_1$, $B_1$) $\cup$ computeJoin($A_1$, $B_1'$) $\cup$ computeJoin($A_1'$, $B_1$)
16: $G \leftarrow$ Aggregate($J$, $a$)
17: $S \leftarrow G$
18: $i \leftarrow 1$, $j \leftarrow 1$
19: while $i < L_A$ do
20:   while $j < L_B$ do
21:     $J'_{ij} \leftarrow$ computeJoin($A_i$, $B_j$)
22:     $G_{ij} \leftarrow$ Aggregate($J'_{ij}$, $a$)
23:     $S \leftarrow S \cup$ computeSkylineUsingTargetSets($G_{ij}$)
24:     $j \leftarrow j + 1$
25:   end while
26:   $i \leftarrow i + 1$
27: end while

Set | Target Sets
$A_2 \bowtie B_2$ | $A_1 \bowtie B_1$
$A_2 \bowtie B_2'$ | $A_1 \bowtie B_1$, $A_1 \bowtie B_1'$
$A_2' \bowtie B_2$ | $A_1 \bowtie B_1$, $A_1' \bowtie B_1$
$A_2' \bowtie B_2'$ | $A_1 \bowtie B_1$, $A_1 \bowtie B_1'$, $A_1' \bowtie B_1$

Table 5: Target sets for iterative algorithm.

The computeSkylineUsingTargetSets method mentioned in the algorithm determines the skyline records in the set $G_{ij}$ by comparing only with the target sets corresponding to it, as shown in Table 5. The first iteration of the iterative algorithm remains the same as in the MSC algorithm. In the second iteration, the sets $A_2$ and $B_2$ are joined, and the joined tuples are compared with only the target sets shown in Table 5. Similarly, in the next iteration, the local skyline is further computed in $A_2'$ and $B_2'$, and so on until the cardinality falls below the threshold $\delta$.

For the running example given in Table 1, the break-up of the relations into the different sets $A_1$, $A_2$, etc. is shown in Table 3 and Table 4. Here, $A_2'$ and $B_2'$ are not further categorized, as they have only two tuples, and no tuple dominates the other. In other words, $A_3 = A_2'$ and $B_3 = B_2'$, and the sets $A_3'$ and $B_3'$ are empty. Hence, this is considered the last iteration.
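A minimal Python sketch of the filtering step performed by computeSkylineUsingTargetSets is given below. The function signature, the flat tuple representation, and the sample values are assumptions for illustration. The sketch only shows that candidates are tested against the tuples of their target sets rather than against every joined tuple.

```python
def dominates(x, y):
    """True if x is no worse than y everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline_using_target_sets(candidates, target_sets):
    """Keep the candidates that no tuple of the given target sets dominates."""
    targets = [t for ts in target_sets for t in ts]
    return [c for c in candidates if not any(dominates(t, c) for t in targets)]

# Made-up attribute vectors (lower is better); they only illustrate the filter.
targets_level1 = [(1, 2, 9), (2, 1, 12)]     # stands in for a first-level target set
candidates_level2 = [(3, 3, 10), (4, 2, 7)]  # stands in for a second-level joined set
survivors = skyline_using_target_sets(candidates_level2, [targets_level1])
```

In this made-up instance the first candidate is dominated by (1, 2, 9), while the second survives because its last attribute is better than that of every target tuple.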



4.5.1 Analysis 

The cost analysis of the iterative algorithm depends heavily on the cardinality of the non-skyline sets produced progressively. The number of tuples that are joined remains the same as in the MSC approach. However, the ASJQ computation cost for the tuples in $A_1' \bowtie B_1'$ reduces significantly, since the search space for each tuple is iteratively pruned.

As a result, it performs significantly better in compar- 
ison to the other algorithms for datasets with large non- 
(full)skyline sets. This is due to the fact that the non- 
skyline sets are not blindly joined with each other, but 
rather only the relevant records are joined and compared 
in a progressive manner. This cuts down many unneces- 
sary skyline tests, thereby improving the efficiency. 

4.6 ASJQ with Single Aggregate Attribute 

A special case of the Aggregate Skyline Join Query arises when it involves only a single aggregate attribute. The processing then becomes substantially easier. As shown in Section 4.2, the records that do not exist in the full skyline set of each relation (i.e., those in $A_0'$ and $B_0'$) are discarded. However, when the number of aggregate attributes is one, even the tuples formed by joining $A_0$ with $B_0$ do not need to be examined. An interesting observation, summarized in the following lemma, leads to the expeditious generation of the skyline points: the tuples in $A_0 \bowtie B_0$ are guaranteed to be part of the final skyline set.

Lemma 3. When there is only one aggregate attribute, 
the tuples formed by joining the full skyline points of each 
relation always exist in the ASJQ result set. 

Proof. Consider the set $A_0$ ($B_0$) to be divided into the local skyline set $A_1$ ($B_1$) and the non-skyline records $A_1'$ ($B_1'$). Using Theorem 2, the tuples in $A_1 \bowtie B_1$ exist in the final skyline set.

Consider a tuple $t' = u' \bowtie v' \in A_1' \bowtie B_1'$. We claim that there does not exist any tuple $t = u \bowtie v$ that dominates $t'$ fully. To counter the claim, assume that such a tuple $t$ exists. Since $t$ dominates $t'$, the local attributes of $t$ must dominate those in $t'$. Thus, $u \succ u'$ and $v \succ v'$. Next, consider the aggregate attribute of $t'$, expressed as $g_{t'} = g_{u'} \oplus g_{v'}$. Note that since $u'$ is a full skyline record, no tuple, and in particular $u$, can dominate $u'$ in all the attributes. That is to say, $u'$ must dominate $u$ in the aggregate attribute, since it is being dominated in all the other (local) attributes, i.e., $g_{u'}$ dominates $g_u$. Similarly, $g_{v'}$ dominates $g_v$. Since the aggregate function $\oplus$ is a monotone function, $g_{t'} = g_{u'} \oplus g_{v'}$ dominates $g_t = g_u \oplus g_v$. Therefore, the claim that $t$ dominates $t'$ fully is false. Consequently, the tuple $t'$ must be in the final skyline set.

Similarly, any tuple in $(A_1' \bowtie B_1) \cup (A_1 \bowtie B_1')$ must also be a skyline record. Together, all the tuples in $A_0 \bowtie B_0$ exist in the ASJQ result set. □

Parameter | Symbol | Value
Number of local attributes | L | 2
Number of aggregate attributes | G | 2
Cardinality of datasets | N | 40000
Number of categories | C | 10
Distribution of datasets | D | Correlated

Table 6: Default parameters for synthetic data.

Setting | N | D | L | G | C
Setting 1 | 10000 | Correlated | 2 | 3 | 10
Setting 2 | 10000 | Correlated | 3 | 2 | 10
Setting 3 | 3162 | Independent | 2 | 2 | 10
Setting 4 | 316 | Anti-Correlated | 1 | 2 | 10
Setting 5 | 316 | Independent | 2 | 1 | 10

Table 7: Experimental settings.

Therefore, when there is only one aggregate attribute, an algorithm that computes the full skyline sets of the two relations and returns their join as the final ASJQ result is the optimal algorithm, since by Lemma 3 none of the joined tuples needs to be tested further.
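The statement of Lemma 3 can be checked on a small instance with the following Python sketch. Several simplifying assumptions are made that are not in the text: the join is a Cartesian product (a single join category), each record is a tuple whose last entry is the aggregate attribute, the aggregate function is a sum, lower values are preferred, and the data values are made up. The function names are hypothetical.

```python
from itertools import product

def dominates(x, y):
    """True if x is no worse than y everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(rows):
    """Rows that are not dominated by any other row."""
    return [r for r in rows if not any(dominates(s, r) for s in rows)]

def join_tuple(u, v):
    """Local attributes of u, then of v, then the summed aggregate attribute."""
    return u[:-1] + v[:-1] + (u[-1] + v[-1],)

def asjq_naive(A, B):
    """Join everything first, then apply the skyline operator."""
    return sorted(skyline([join_tuple(u, v) for u, v in product(A, B)]))

def asjq_single_aggregate(A, B):
    """Join only the full skyline sets; no further dominance test is applied."""
    return sorted(join_tuple(u, v) for u, v in product(skyline(A), skyline(B)))

# Each record: (local attribute, aggregate attribute); made-up values.
A = [(1, 9), (2, 7), (3, 8)]
B = [(4, 1), (5, 1), (6, 0)]
```

On this instance both functions return the same four joined tuples, while the naive variant has to examine all nine.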

5 Experimental Evaluation 

In this section, we evaluate the ASJQ algorithms experimentally. We implemented them in Java on an Intel Core2Duo 2GHz machine with 2GB RAM running Linux. We used the synthetic dataset generator
given in http://www.pgfoundry.org/projects/randdataset/ 
and used in Q. We also used a real dataset 
of statistics of basketball players obtained from 
http://www.databasebasketball.com/. For the skyline algorithm, we employed the SFS method [4]¹ and used hash-join [14] for implementing the join.

We analyze the execution times of the four algorithms, (1) Naive, (2) MSC, (3) Dominator-based, and (4) Iterative, based on the following parameters: (i) number of local attributes (L), (ii) number of aggregate attributes (G), (iii) cardinality of datasets (N), (iv) number of categories in each relation for the joining attribute, assuming an equi-join (C), and (v) distribution of datasets (D). Unless mentioned otherwise, the default settings of the five parameters for experiments with the synthetic data are given in Table 6.

¹ The choice of SFS versus other algorithms such as LESS [7] does not matter, as the focus is on the join and not the skyline computation.

5.1 Performance of the naive algorithm

The first experiment examines the difference in performance between the naive algorithm and the other ASJQ algorithms. We use five random settings of synthetic datasets, as shown in Table 7. The plots in Figure 2 compare the execution times of the different algorithms. The join condition is an equi-join on a single attribute.

For all the five settings, the naive algorithm requires 
much higher running times. Further, while the perfor- 
mance of the other algorithms depends on the final car- 
dinality of the ASJQ result set and is proportional to it, 
the naive algorithm is more or less independent of the fi- 
nal cardinality. This is due to the fact that it spends most 
of the time in computing the join of the relations and then 
applies the skyline operator on the large joined relation. 

Due to the large gap in the running times, we conclude 
that the naive algorithm is not practical in comparison to 
the other algorithms. Consequently, we do not report the 
results of the naive algorithm any further. 

5.2 Effect of dimensionality 

The next experiment measures the effect of the number of local attributes (L) on the algorithms. Figure 3(a) shows that the running time increases sharply when L increases. This can be attributed to the fact that the cardinality of the ASJQ result set increases almost exponentially (Figure 3(b)). As the dimensionality of the datasets (i.e., the number of attributes) increases, the probability of a tuple being dominated in all the attributes decreases, thereby sharply increasing the number of skyline records.

Figure 2: Comparison with naive algorithm.

Figure 3: Effect of number of local attributes.

Figure 4: Effect of number of aggregate attributes.

Figure 5: Effect of dataset cardinality.



The iterative algorithm shows the best scalability since 
it processes the skyline sets progressively. At lower di- 
mensions, the time to find the full skyline sets in the in- 
dividual relations is the dominating factor of the overall 
time, and hence, there is little difference between the al- 
gorithms. 

Figure 4(a) and Figure 4(b) show similar trends for the number of aggregate attributes (G). Interestingly, the absolute times are much lower than those for the corresponding number of local attributes. Incrementing the number of local attributes increases the dimensionality of the joined relation by two, whereas incrementing the number of aggregate attributes increases it by only one. Thus, the effect of dimensionality is less pronounced. Consequently, the cardinality of the final ASJQ set is smaller.

The MSC algorithm performs better than the 
dominator-based algorithm since the number of lo- 
cal attributes is small and the local dominator sets 
are larger. Consequently, the overhead of dominator 
computation and comparison offsets the advantages. 

5.3 Effect of dataset cardinality 

The next experiment measures the effect of the cardinality of the individual relations on ASJQ processing. The cardinality of the joined relation increases quadratically with the individual cardinality, assuming that the data distribution remains the same. For example, assume two datasets with $N = 10^4$ tuples each. If an equi-join condition is used where the number of categories of the joining attribute is assumed to be 10, each category has on average $10^3$ tuples. Hence, the total cardinality of the joined relation becomes $10 \times (10^3)^2 = 10^7$.

Figure 5, however, shows that the cardinality of the ASJQ result set does not increase quadratically. (The figure reports results for 4 local and 4 aggregate attributes. The cardinality and the running time for L = 2 and G = 2 were too low.) The number of skyline records depends more on other parameters of the dataset, such as dimensionality and distribution. Consequently, the ASJQ algorithms scale better with N than the quadratic growth of the joined relation would suggest.

5.4 Effect of dataset distribution 

We measured the effect of three standard data distributions (correlated, independent, and anti-correlated) on the ASJQ algorithms. The results are shown in Figure 6. The cardinality for the correlated dataset is very small, while that for the anti-correlated dataset is quite large. In a perfectly correlated dataset, there is only one skyline record, which dominates all other records. In a perfectly anti-correlated dataset, every record is in the skyline set. The independent dataset is mid-way, and the cardinality depends on the dimensionality. This behavior is reflected in the results.

Figure 6: Effect of dataset distribution.

Figure 7: Effect of number of categories of join attribute.

For the correlated and the independent datasets, the run- 
ning times of the three algorithms are similar, while for 
the anti-correlated dataset, the iterative algorithm shows 
an advantage, as it processes the large dominator sets pro- 
gressively by only comparing it with certain target sets. 
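The two extreme cases mentioned in this section, a perfectly correlated and a perfectly anti-correlated dataset, can be reproduced with a small Python sketch. The two-attribute layout, the dataset size, the construction of the points, and the convention that lower values are preferred are assumptions made for illustration.

```python
def dominates(x, y):
    """True if x is no worse than y everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(rows):
    """Rows that are not dominated by any other row."""
    return [r for r in rows if not any(dominates(s, r) for s in rows)]

n = 100
# Perfectly correlated: both attributes grow together.
correlated = [(i, i) for i in range(n)]
# Perfectly anti-correlated: one attribute improves exactly as the other worsens.
anti_correlated = [(i, n - 1 - i) for i in range(n)]

correlated_skyline = skyline(correlated)
anti_correlated_skyline = skyline(anti_correlated)
```

The correlated set yields a single skyline record, while every record of the anti-correlated set survives, which is why the result cardinality and the running time grow from correlated to anti-correlated data.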

5.5 Effect of number of categories of join attribute

The final experiment on synthetic data measures the effect of the number of categories of the join attribute. We assume that only one attribute is used for joining the two relations, and that the join condition is an equi-join. The number of categories signifies the number of possible values of the join attribute.

For datasets with cardinality N and number of categories C, assuming a uniform distribution of the join attribute, the total cardinality of the joined relation is $C \times (N/C)^2 = N^2/C$. Hence, as C increases, the cardinality decreases. When C = 1, the join degenerates to a Cartesian product of the two relations with $N^2$ tuples.
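The cardinality formula can be evaluated directly, as in the short Python sketch below. The function name is hypothetical, and the uniform distribution of the join attribute is the same assumption as in the text.

```python
def expected_join_cardinality(n: int, c: int) -> float:
    """Expected equi-join size for two relations of n tuples and c uniform categories."""
    per_category = n / c            # tuples per category in each relation
    return c * per_category ** 2    # c categories, each joined within itself

print(expected_join_cardinality(10**4, 10))  # the example from Section 5.3
print(expected_join_cardinality(10**4, 1))   # Cartesian product
```

For N = 10^4 the two calls give 10^7 and 10^8 joined tuples, matching the example in Section 5.3 and the Cartesian product case, respectively.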

The cardinality of the ASJQ result, however, does not decrease with C. As shown in Figure 7(b), it attains a maximum in the middle. When C is low, even though the number of tuples is high, the chance of a tuple dominating others is higher, as the join attribute is the same for a larger number of tuples. At higher values of C, the number of joined tuples becomes small, leading to lower ASJQ cardinality.

Figure 7(a) shows that regardless of the cardinality, the running time increases with increasing C. When C is larger, the initial full skyline sets ($A_0$ and $B_0$) are larger, as there is less probability of a tuple matching another tuple in the join attribute and, therefore, dominating it. Consequently, the later stages of the algorithm are affected and the running time increases.

5.6 Real Datasets 

In this section, we evaluate the performance of the ASJQ algorithms for a real dataset. The real dataset consists of the statistics of basketball players obtained from http://www.databasebasketball.com/. The cardinality of the dataset was $N \approx 10^4$ with 3 local attributes (L = 3) and 2 aggregate attributes (G = 2). We performed a self-join of the dataset with the join condition as equality.

Four settings were created by varying the number of 
join attributes. In setting 1, year was used as the join at- 
tribute, while in setting 2, the dataset was joined on the 
team. For setting 3, no join attribute was used, which cor- 
responds to the Cartesian product of the relations. Setting 
4 used both the attributes for joining. 

The results are summarized in Figure 8. The cardinality of the final ASJQ result set was the highest when no join attribute was used (setting 3) and was lowest when both the attributes were used (setting 4). The running times reflected the trends of the cardinalities. The iterative algorithm performed the best, followed by the dominator-based approach. The MSC algorithm was the slowest. The strategy of the iterative algorithm to prune progressively proved to be the best.

Figure 8: Real datasets.

6 Conclusions

In this paper, we have proposed a novel query, the Aggregate Skyline Join Query (ASJQ). This extends the general skyline operator to multiple relations involving joins using aggregate operations over attributes from different relations. The ASJQ processing is explained with the MSC approach, the dominator-based approach, and the iterative approach, in addition to the naive algorithm. Extensive experiments confirm that our algorithms perform well with real datasets and scale well with the dimensionality and cardinality of the relations. In the future, we would like to extend ASJQ to distributed environments and devise parallel algorithms to process the queries more efficiently.